ABSTRACT

Objective: The use of electronic medical record 
(EMR) data is necessary to improve clinical research 
efficiency. However, it is not easy to identify patients 
who meet research eligibility criteria and collect the 
necessary information from EMRs because the data 
collection process must integrate various techniques, 
including the development of a data warehouse and 
translation of eligibility criteria into computable criteria. 
This research aimed to demonstrate an electronic 
medical records retrieval system (ERS) and an example 
of a hospital-based cohort study that identified both 
patients and exposure with an ERS. We also evaluated 
the feasibility and usefulness of the method. 
Design: The system was developed and evaluated. 
Participants: In total, 800 000 cases of clinical 
information stored in EMRs at our hospital were used. 
Primary and secondary outcome measures: The
feasibility and usefulness of the ERS, the method to
convert text-form eligibility criteria into computable criteria,
and a confirmation method to increase research data
accuracy.

Results: To comprehensively and efficiently collect 
information from patients participating in clinical 
research, we developed an ERS. To create the ERS 
database, we designed a multidimensional data model 
optimised for patient identification. We also devised 
practical methods to translate narrative eligibility 
criteria into computable parameters. We applied the 
system to an actual hospital-based cohort study 
performed at our hospital and converted the test 
results into computable criteria. Based on this 
information, we identified eligible patients and 
extracted data necessary for confirmation by our 
investigators and for statistical analyses with our ERS. 
Conclusions: We propose a pragmatic methodology 
to identify patients from EMRs who meet clinical 
research eligibility criteria. Our ERS allowed for the 
efficient collection of information on the eligibility of a 
given patient, reduced the labour required from the 
investigators and improved the reliability of the 
results. 



ARTICLE SUMMARY 



Article focus 

■ The focus of this work was to establish a prag- 
matic methodology to efficiently collect informa- 
tion from electronic medical records (EMRs) 
about patients who meet clinical research eligibil- 
ity criteria. 

Key messages 

■ The use of EMR data is necessary to improve clinical
research efficiency. However, it is not easy to
identify patients who meet research eligibility
criteria and collect necessary data from EMRs
because the data collection process must integrate
various techniques, including the development of a
data warehouse and the translation of eligibility
criteria into computable criteria. An efficient ERS and
a standardised data processing model that integrates
these techniques are essential to facilitate
clinical research that utilises EMRs.

Strengths and limitations of this study 

■ Our method uses a specialised data model for
patient identification in clinical research and efficient
data conversion that does not depend on
the EMR database structure when converting
narrative criteria to computable criteria.

■ We propose that computable criteria should not 
be a result of the automated conversion of narra- 
tive criteria but rather a result of research prepar- 
ation involving medical concepts that are not 
expressed logically or explicitly in the narrative 
criteria. Therefore a large amount of the conver- 
sion of the eligibility criteria to computable 
criteria should be executed at the protocol devel- 
opment stage. 

■ It is important to further discuss protocol standardisation,
including eligibility criteria representation
for computable use.

■ Enabling medical records retrieval system use in
and across multiple institutions is an important
future task.
BACKGROUND

Medical information technology has recently advanced 
in many countries, and enormous amounts of clinical 
data are already stored as electronic medical records 
(EMRs). Utilising the data collected in EMRs is neces- 
sary to improve clinical research efficiency. 1-3 An EMR is 
a large database of patient data and is used in observa- 
tional research to investigate the relationships among 
diseases, treatments and outcomes, 4-7 to conduct surveil- 
lance for rare drug reactions, 4 8 and to recruit patients 
for clinical trials. 9-13 However, it is not easy to identify 
patients who meet research eligibility criteria and collect 
necessary information from EMRs. 2 3 Herein, we 
describe three major issues concerning EMR-based 
observational studies: EMR patient data retrieval func- 
tion, eligibility criteria protocol representation and EMR 
data accuracy. 

To identify patients who meet research eligibility cri- 
teria, it is necessary to obtain various types of informa- 
tion stored in EMRs by subject, for example, diagnosis 
and prescribed medications. However, the EMR database 
is designed to facilitate online transaction processing for 
rapid and detail-oriented clinical information searches 
on individual patients, and the current EMR system does 
not facilitate this retrieval function. 2 3 14 Data ware- 
houses are essential components of data-driven decision 
support. To allow for efficient research analyses, EMR 
data must first be warehoused to enable data analyses 
across patient populations. 15-21 However, healthcare 
data modelling is difficult and time consuming because 
of the complexity of the medical knowledge involved. 
Thus, the most common approaches to clinical data
warehouse modelling are variations on the entity-
attribute-value (EAV) model, 22-28 where data are stored
in a single table with three columns: entity identifica-
tion, attribute and attribute value. The EAV design has
advantages, including flexibility and ease of storage;
however, it requires transforming EAV data into another
analytical format before analysis. 25 28 Online analytical
processing (OLAP) is most frequently used for searching 
data stored in the data warehouse. 29-31 OLAP systems 
in relational databases are typically designed based on 
Kimball's star schema. 32 However, the star schema was 
devised to facilitate online measurement analyses. In 
healthcare, this method can be used to dynamically 
gather online analyses of numeric data (eg, a specific 
dose of a drug for a specific disease) in clinical practice. 
Therefore, this method is not suitable for identifying 
patients who meet the complicated eligibility criteria for 
a given clinical research study. Data-modelling methods 
that facilitate the identification of patients and enable 
the collection of necessary information from EMRs 
remain to be established. 28 
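
The transformation that an EAV design requires before analysis can be illustrated with a minimal Python sketch. The rows, the attribute names and the helper `pivot_eav` are hypothetical and are not taken from any EMR or from the ERS; they only show why EAV triples must be regrouped per patient before an attribute can be filtered like a column.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity identification, attribute, attribute value).
eav_rows = [
    ("P001", "sex", "F"),
    ("P001", "diagnosis_code", "M81"),
    ("P002", "sex", "M"),
    ("P002", "drug_code", "D123"),
]


def pivot_eav(rows):
    """Group EAV triples into one dictionary of attributes per entity."""
    wide = defaultdict(dict)
    for entity_id, attribute, value in rows:
        wide[entity_id][attribute] = value
    return dict(wide)


patients = pivot_eav(eav_rows)

# Only after pivoting can an attribute be tested per patient.
with_diagnosis = [pid for pid, record in patients.items()
                  if "diagnosis_code" in record]
print(with_diagnosis)
```

In this sketch the pivot is the extra step that a subject-oriented model avoids, because each subject already has its own table with named columns.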

Current eligibility criteria are written in a text format
that cannot be computationally processed. Additionally,
to be applied to actual EMRs, eligibility criteria need to
be integrated with the data model of the EMRs. 33 Several
investigations have sought to establish computable
eligibility criteria. 34-41 However, there is no consensus
regarding a standard patient information model, 33 and
the eligibility criteria are not yet completely standardised.
Using natural language processing (NLP) technologies to
convert the text format of eligibility criteria to a computable
format or to extract patient identifications from EMRs is far from
perfect without human intervention. 3 42 43

Current EMRs have been used to support claims 
for medical service fees and the treatments administered 
to each patient; therefore, data gathered specifically 
for research purposes may be incomplete and 
unreliable. 2 3 44 

Although various investigations on each technique are 
executed individually, standardised methods must still 
be established that integrate these techniques, facilitate 
the identification of patients who are eligible for clinical 
research, and collect necessary information from EMRs. 

OBJECTIVE 

We designed a pragmatic data processing model opti- 
mised for patient identification and for the collection of 
necessary information from EMRs for clinical research. 
These tools are implemented as an electronic medical
records retrieval system (ERS). 44

This research aimed to demonstrate an ERS and an 
example of a hospital-based cohort study that used the 
ERS to identify both patients and exposure. Another 
aim was to evaluate the feasibility and usefulness of the
ERS, the method to convert text-form eligibility criteria
to computable criteria, and a confirmation method to
increase research data accuracy.

MATERIALS AND METHODS 

Outline of our procedure for patient identification and data 
collection from the EMR 

To identify patients who met the eligibility criteria for 
the clinical research in question, data were collected in 
the following ways: 

1. The text form of the narrative criteria was converted 
into computable criteria. 

2. A targeted patient list was created. 

3. A flag was added for investigators to confirm the tar- 
geted patient list. 

4. Reports were created for the investigators to confirm. 

5. After confirmation by the investigator, the statistical 
analyses were executed. 

EMR retrieval system 

In our hospital, EMR use was introduced in 2005; 
approximately 800 000 cases of clinical information have 
already been stored. To comprehensively and efficiently 
collect information about patients participating in clin- 
ical research, we developed an ERS. 44 

EMRs store various types of information, integrating
billing, pharmacy, radiology, laboratory information and
others. 4 In creating the ERS database, we designed a
new data model based on the star schema that was
optimised for patient identification in clinical research.
We identified nine data categories from EMRs that are
useful for clinical research: demographic characteristics,
physical findings, diagnostic studies, laboratory tests,
diagnoses, progress reports on an EMR template, 44 45
medications and injections, operation records and other
treatments. We then designated these categories as
'entities'. In our hospital, diagnoses are managed by
codes that were originally defined by our hospital and
mapped to International Statistical Classification of
Diseases (ICD) 10 codes 46 for medical insurance pur-
poses. Operations are also managed by codes
that were originally defined by our hospital and mapped
to ICD-9 Clinical Modification codes. We identified
available columns (eg, ICD code, diagnosis date) from
the EMR data model and designated these columns as
'attributes' of the entities.

Figure 1 presents our data model. In our model, all
entities in a given schema are independent and com-
plete; this allows for logical operations and for the
creation of eligible patient lists for each respective par-
ameter in a study. The target patient list is generated by
combining these patient lists. The data model also sup-
ports the inference of medical concepts expressed in the
eligibility criteria in reference to corresponding patient
data accumulated in EMRs. 33 34
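
The combination of per-criterion patient lists can be pictured as set logic. The following Python sketch assumes four hypothetical lists of patient IDs, each of which would come from one independent entity; the IDs and variable names are illustrative only.

```python
# Hypothetical patient-ID lists, one per criterion. Each list would be
# produced by querying a single entity on its own.
all_patients = {"P001", "P002", "P003", "P004"}   # demographic entity
osteoporosis_dx = {"P001", "P002", "P003"}        # diagnosis entity
osteoporosis_drug = {"P001", "P002", "P004"}      # medications and injections
intravenous_bp = {"P002"}                         # medications and injections

# Inclusion lists are intersected ('and'); exclusion lists are subtracted.
target = (all_patients & osteoporosis_dx & osteoporosis_drug) - intravenous_bp
print(sorted(target))
```

Because every list is complete on its own, a further criterion adds one more set to the expression without changing the others.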

In our hospital, a replicate of the EMR database 
known as 'Open DB' was established for the secondary 
use of accumulated EMR data. 7 A data mart for our ERS 
was created to ensure that the data retrieval process was 
practical and independent of the EMR system structure; 
the data mart was created on the relational database 
management system by extracting, transforming and 
loading (ETL) information from the Open DB. 7 44 The 
ETL process is performed automatically once nightly 
except for the 'Progress notes by EMR template' entity, 
which is referred directly from the Open DB to ensure 
real-time visibility for the eClinical trial. 44 

An OLAP tool was installed to efficiently search
through data from multiple patients. 44 The OLAP tool
runs in an Internet browser and can generate structured
query language (SQL) based on predefined metadata
(ie, a data model) by defining logical queries (ie, pro-
grammes) using a graphical user interface (GUI).
Moreover, this tool allows reports on information
retrieved from the browser to be transcribed using
hypertext markup language (HTML). The reports are
created in various formats, including portable document
format (PDF), comma-separated values (CSV) and
extensible markup language (XML). 44

Figure 1 Data model for our electronic medical record retrieval system.

To protect personal information in medical records at 
our hospital, the EMR network is separated physically 
from other networks. Our data mart and OLAP servers 
are deployed in the same EMR network and managed 
using the same EMR security policies. Additionally, the 
use of our ERS is limited to clinical research approved 
by the ethics committee at our hospital, and only desig- 
nated staff members at our centre are allowed to retrieve 
data. Our centre creates and manages ERS user identifi- 
cation separate from the EMRs. For the external output 
of CSV and other data, permission must be obtained 
from our department of medical informatics, and data 
extraction must be executed in the presence of supervi- 
sors who are responsible for protecting personal infor- 
mation at our hospital. 



Application to clinical research 

We applied the system to a hospital-based cohort study 
performed at our hospital titled 'Risk of osteomyelitis of 
the jaw induced by oral bisphosphonates (BP) in 
patients taking medications for osteoporosis: a hospital- 
based cohort study in Japan', 47 in which we identified 
eligible patients, extracted research data and evaluated 
the feasibility of our system. The ethics committee at 
Kyoto University Hospital approved this research. A dif- 
ferent paper details the purpose, methods, results and 
discussion of this research. 47 

This research aimed to estimate the risks for osteomye- 
litis of the jaw in osteoporosis patients at our hospital 
who had been exposed to oral BP compared with those
who had not. 48 49

The eligibility criteria were as follows: 

Inclusion criteria: 

► Patients diagnosed with osteoporosis and treated with 
osteoporosis medications at Kyoto University Hospital 
between November 2000 and October 2010. 

► Patients aged 20 years or older.

Exclusion criteria:

► Patients with a history of treatment with radiation 
therapy to the maxillofacial region. 

► Patients with primary or metastatic tumours in the 
maxillofacial region. 

► Patients treated with intravenous BP. 

The data collected were diagnosis, date of diagnosis,
sex, birthdate and the doses and dates when osteopor-
osis medications, steroids, anticancer drugs, diabetes
drugs and HbA1c tests were administered.



Conversion of the text form of the narrative criteria 
to computable criteria 

To identify eligible patients and collect the necessary 
data from the EMRs, narrative criteria and data must be 
converted to computable criteria. Such computable cri- 
teria include entities, attributes, logical operators 
(ie, 'and' and 'or'), codes and parameters. 33-37 The clin- 
ical research purpose and clinical practice demands 
made it necessary to perform this task. 

We manually executed the conversion from text eligi- 
bility criteria to computable criteria. As an example of 
the conversion from narrative criteria to computable cri- 
teria, we present the following two-step conversion 
procedure: 

Step 1: Convert the narrative criteria into entity-level criteria 

Medical concepts expressed as narrative criteria are 
mapped onto entities in the data model and converted 
into entity-level criteria. This task is manually performed 
at the protocol development stage of the study by the 
investigators. For each entity, a criterion is created to 
extract patients who meet each condition. If exclusive 
conditions for the same entity must be defined, a differ- 
ent criterion is created. Additionally, the list of codes for 
drugs and diagnoses (ie, ICD-10) is created, and the 
period of treatments and others are defined by investiga- 
tors. In this study, we mapped 'osteoporotic patients' 
onto two entities (ie, 'diagnosis' and 'medications and 
injections') and converted it to a combination of two cri- 
teria (ie, 'diagnosis of osteoporosis' and 'osteoporosis 
drug administration'). In the test research, we defined 
the entity-level criteria according to the entered diagno- 
sis and ordered treatments rather than the diagnostic 
criteria of the disease. This process reflects that the test 
research aimed to estimate some risks of osteomyelitis of 
the jaw with BP administration instead of diagnosing 
osteoporosis patients accurately. The recorded diagnosis 
in the EMR was typically designed to ensure payment for 
medical claims. We thus sought to reduce the number of 
false-positives by extracting patients with a given treat- 
ment type. 

Step 2: Convert entity-level criteria into attribute-level criteria 
(ie, computable criteria) 

The abovementioned corresponding codes, date and para- 
meters are mapped onto attributes of the entity-level cri- 
teria, and these factors become computable criteria. 
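
As an illustration of what an attribute-level criterion contains, the following Python sketch stores one criterion as an entity plus a list of (attribute, operator, parameter) conditions and renders it as a patient-ID subquery. The function `to_subquery` and the code list `('M80', 'M81')` are placeholders introduced here for illustration; they are not the code lists used in the study, and the ERS itself generates SQL through its GUI rather than through code like this.

```python
def to_subquery(entity, conditions):
    """Render attribute-level conditions as a patient-ID subquery.

    `conditions` is a list of (attribute, operator, parameter) tuples that
    are joined with 'and'. Parameters are inserted as literal text, which is
    acceptable for a display sketch but not for untrusted input.
    """
    where = " and ".join(f"{attribute} {operator} {parameter}"
                         for attribute, operator, parameter in conditions)
    return f"Select PatientId From {entity} Where {where}"


# Entity-level criterion 'diagnosis of osteoporosis' expressed at the
# attribute level. The code list is a placeholder.
osteoporosis_diagnosis = [
    ("ICD10Code", "in", "('M80', 'M81')"),
    ("DiagnosisDate", ">=", "'10/01/2000'"),
    ("DiagnosisDate", "<=", "'09/30/2010'"),
    ("SuspectedFlag", "=", "'Fixed'"),
]

sql = to_subquery("Diagnosis", osteoporosis_diagnosis)
print(sql)
```

Keeping the conditions as data makes each criterion easy to review at the protocol development stage before any query is run.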

Creating a targeted patient list 

A targeted patient list is created from the entire set of 
patients for whom EMRs have been obtained by defin- 
ing logical queries (ie, programmes defined by the GUI) 
based on the computable criteria included in the ERS. 

Logical queries are first defined in the ERS to identify
patients who meet the conditions for each criterion.
The ERS automatically generates the SQL necessary for
data extraction according to the logical queries. Logical
queries are then defined to include or exclude eligible
patients who meet each criterion for the demographic
entity. The targeted patient list is created by executing
the logical query. Figure 2 presents an example of an
SQL automatically generated by the ERS.

    Create View_PatientsList as
    Select PatientId From Demographics a
    Where
      a.PatientId in (
        Select PatientId From Diagnosis
        Where ICD10Code in (osteoporosis ICD10 code list) and
        DiagnosisDate >= '10/01/2000' and DiagnosisDate <= '09/30/2010' and
        SuspectedFlag = 'Fixed')
    and
      a.PatientId in (
        Select PatientId From MedicationsAndInjections
        Where DrugCode in (osteoporosis drugs code list) and
        ExecuteDate >= '10/01/2000' and ExecuteDate <= '09/30/2010')
    and
      a.PatientId not in (
        Select PatientId From MedicationsAndInjections
        Where DrugCode in (intravenous BP drug code list) and
        ExecuteDate >= '10/01/2000' and ExecuteDate <= '09/30/2010')

Figure 2 Example structured query language (SQL) to
create the target patient list.

    Select PatientId, 'Oral BP administrations', '1' From View_PatientsList a
    Where a.PatientId in (
        Select PatientId From MedicationsAndInjections
        Where DrugCode in (oral BP drugs code list) and
        ExecuteDate >= '10/01/2000' and ExecuteDate <= '09/30/2010')
    Union
    Select PatientId, 'Oral BP administrations', '0' From View_PatientsList a
    Where a.PatientId not in (
        Select PatientId From MedicationsAndInjections
        Where DrugCode in (oral BP drugs code list) and
        ExecuteDate >= '10/01/2000' and ExecuteDate <= '09/30/2010')
    Union
    Select PatientId, 'Inflammatory jaw condition diagnosis', '1' From View_PatientsList a
    Where a.PatientId in (
        Select PatientId From Diagnosis
        Where ICD10Code in (inflammatory conditions of jaws ICD10 code list) and
        DiagnosisDate >= '10/01/2000' and DiagnosisDate <= '09/30/2010' and
        SuspectedFlag = 'Fixed')
    Union
    Select PatientId, 'Inflammatory jaw condition diagnosis', '0' From View_PatientsList a
    Where a.PatientId not in (
        Select PatientId From Diagnosis
        Where ICD10Code in (inflammatory conditions of jaws ICD10 code list) and
        DiagnosisDate >= '10/01/2000' and DiagnosisDate <= '09/30/2010' and
        SuspectedFlag = 'Fixed')

Figure 3 Example structured query language (SQL) to flag
the target patient report for investigator confirmation.

We thus designed our data model to enable the cre- 
ation of a targeted patient list by defining the patients 
extracted from each criterion (ie, 'in' or 'not in') as con- 
ditions for the demographic entity that was the unique 
patient list for the entire hospital. If logical queries are 
defined using our method, even if the eligibility criteria 
are complicated, it is not necessary to dramatically 
change the SQL structure generated in the ERS. 
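
The 'in' and 'not in' pattern can be tried on a toy database. The following Python sketch uses the standard `sqlite3` module with three small tables whose names follow the entities described above; the rows, the drug codes `ORAL_OP` and `IV_BP`, the ICD-10 code and the ISO-formatted dates are assumptions made for illustration and do not reproduce the hospital's data mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Demographics (PatientId TEXT PRIMARY KEY);
CREATE TABLE Diagnosis (PatientId TEXT, ICD10Code TEXT,
                        DiagnosisDate TEXT, SuspectedFlag TEXT);
CREATE TABLE MedicationsAndInjections (PatientId TEXT, DrugCode TEXT,
                                       ExecuteDate TEXT);
INSERT INTO Demographics VALUES ('P001'), ('P002'), ('P003');
INSERT INTO Diagnosis VALUES
  ('P001', 'M81', '2005-04-01', 'Fixed'),
  ('P002', 'M81', '2006-07-15', 'Fixed'),
  ('P003', 'M81', '2007-01-20', 'Suspected');
INSERT INTO MedicationsAndInjections VALUES
  ('P001', 'ORAL_OP', '2005-05-01'),
  ('P002', 'ORAL_OP', '2006-08-01'),
  ('P002', 'IV_BP',   '2007-02-01'),
  ('P003', 'ORAL_OP', '2007-02-10');
""")

# Each criterion is a separate subquery attached to the demographic entity.
# ISO dates are used so that text comparison orders them correctly.
query = """
SELECT PatientId FROM Demographics a
WHERE a.PatientId IN (
        SELECT PatientId FROM Diagnosis
        WHERE ICD10Code IN ('M81')
          AND DiagnosisDate >= '2000-10-01' AND DiagnosisDate <= '2010-09-30'
          AND SuspectedFlag = 'Fixed')
  AND a.PatientId IN (
        SELECT PatientId FROM MedicationsAndInjections
        WHERE DrugCode IN ('ORAL_OP')
          AND ExecuteDate >= '2000-10-01' AND ExecuteDate <= '2010-09-30')
  AND a.PatientId NOT IN (
        SELECT PatientId FROM MedicationsAndInjections
        WHERE DrugCode IN ('IV_BP')
          AND ExecuteDate >= '2000-10-01' AND ExecuteDate <= '2010-09-30')
ORDER BY PatientId
"""
target = [row[0] for row in conn.execute(query)]
print(target)
```

Adding or removing a criterion changes only one `IN` or `NOT IN` block, which is why the overall SQL structure stays stable as the eligibility criteria grow.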

Flagging entries for investigators to confirm 

To improve research data accuracy, confirmation by 
the investigators is necessary. When confirmation is 
required, additional information is linked. 

For the targeted patient list, logical queries are
defined to flag certain items according to the investiga-
tors' interest. Necessary logical queries are first defined
for each criterion. Logical queries are then defined that
add a flag to each patient on the list: '1' if the data
correspond to the criterion or '0' if they do not. The
data sets created by these operations are joined by
'union' and pivoted into a cross-tabulation list using
statistical analysis software. Figure 3 shows
an example of the SQL generated by the ERS.
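
The flag-and-pivot step can be sketched in Python with `sqlite3`. In this sketch a small table named `View_PatientsList` stands in for the targeted patient list, and the item label `OralBP`, the drug code `ORAL_BP` and the patient IDs are hypothetical. The pivot is done in plain Python here, whereas the study used statistical analysis software.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE View_PatientsList (PatientId TEXT PRIMARY KEY);
CREATE TABLE MedicationsAndInjections (PatientId TEXT, DrugCode TEXT);
INSERT INTO View_PatientsList VALUES ('P001'), ('P002');
INSERT INTO MedicationsAndInjections VALUES ('P001', 'ORAL_BP');
""")

# One '1' branch and one '0' branch per flagged item, joined by UNION.
flag_query = """
SELECT PatientId, 'OralBP' AS Item, 1 AS Flag FROM View_PatientsList a
WHERE a.PatientId IN (SELECT PatientId FROM MedicationsAndInjections
                      WHERE DrugCode IN ('ORAL_BP'))
UNION
SELECT PatientId, 'OralBP' AS Item, 0 AS Flag FROM View_PatientsList a
WHERE a.PatientId NOT IN (SELECT PatientId FROM MedicationsAndInjections
                          WHERE DrugCode IN ('ORAL_BP'))
"""

# Pivot the long (patient, item, flag) rows into one record per patient.
crosstab = {}
for patient_id, item, flag in conn.execute(flag_query):
    crosstab.setdefault(patient_id, {})[item] = flag
print(crosstab)
```

Each additional item to be confirmed contributes another pair of branches to the union and another column to the cross-tabulation.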

Create reports for investigators to confirm 

To help investigators confirm the targeted patient list, 
reports are created by linking the findings for diagnostic 
imaging, pathological diagnosis, operations and other 
findings. Investigators confirm these entries using the 
reports and EMR information, including progress notes 
and images. When the diagnosis history, medication, 
laboratory results, progress notes and other information 
are necessary, the same operation is executed for each 
instance. For example, the list of radiological findings
involves 'patient ID', 'study category', 'report name',
'diagnosis', 'findings' and 'comment'. The reports
may improve the investigators' confirmation efficiency
because they remove the need to refer to the medical
records for each patient who needs confirmation.

Confirmation by the investigator and execution 
of the statistical analyses 

The investigators confirm the accumulated data and 
execute the statistical analysis. In this test research, two 
oral and maxillofacial surgeons diagnosed cases by a 
chart review with an observation of imaging findings. 47 

System evaluation

To evaluate our system, we collected information about 
the research period using the recall method. For the 
accuracy of the data collected by the ERS, we evaluated 
the results after they were confirmed by the investigator. 

RESULTS 

Computable criteria, datasets and system evaluation 

We present the computable criteria in table 1. To 
increase data accuracy, we collected all of the exclusion 
criteria for the investigators to confirm. As table 1 shows, 
we extracted information from EMRs. For investigator 
confirmation, we also reported all targeted patients using 
the following lists: osteoporosis drugs administered, oral 
BP administered, intravenous BP administered, diabetes 
drugs administered, anticancer drugs administered, 
steroid drugs administered, osteoporosis diagnoses, oral 
cancer diagnoses, patients diagnosed with inflammation 
of the jaw, patients diagnosed with other suspicious diseases,
patients diagnosed with diabetes, HbA1c values,
radiological findings, pathological findings and radioiso- 
tope findings. These data were extracted from the ERS 
for statistical analyses, presented in CSV format, and ana- 
lysed using statistics software. 

Table 1 Computable criteria for our test research

| Criterion | Entity | Operator symbol | Attribute | Operator symbol | Parameter |
|---|---|---|---|---|---|
| Created a targeted patient list | | | | | |
| Inclusion criteria: osteoporosis diagnosis | Diagnosis | | ICD10Code | in | (osteoporosis ICD10 code list) |
| | | and | DiagnosisDate | >= | '10/01/2000' |
| | | and | DiagnosisDate | <= | '09/30/2010' |
| | | and | SuspectedFlag | = | Fixed |
| Inclusion criteria: osteoporosis drug administrations | Medications and injections | | DrugCode | in | (osteoporosis drugs code list) |
| | | and | ExecuteDate | >= | '10/01/2000' |
| | | and | ExecuteDate | <= | '09/30/2010' |
| Added a flag for investigators to confirm the targeted patient list | | | | | |
| Exclusion criteria: oral cancer diagnosis | Diagnosis | | ICD10Code | in | (oral cancer ICD10 code list) |
| | | and | DiagnosisDate | >= | '10/01/2000' |
| | | and | DiagnosisDate | <= | '09/30/2010' |
| | | and | SuspectedFlag | = | Fixed |
| Exclusion criteria: intravenous BP administrations | Medications and injections | | DrugCode | in | (intravenous BP drugs code list) |
| | | and | ExecuteDate | >= | '10/01/2000' |
| | | and | ExecuteDate | <= | '09/30/2010' |
| Oral BP administrations | Medications and injections | | DrugCode | in | (oral BP drugs code list) |
| | | and | ExecuteDate | >= | '10/01/2000' |
| | | and | ExecuteDate | <= | '09/30/2010' |
| Inflammatory jaw condition diagnosis | Diagnosis | | ICD10Code | in | (inflammatory conditions of jaws ICD10 code list) |
| | | and | DiagnosisDate | >= | '10/01/2000' |
| | | and | DiagnosisDate | <= | '09/30/2010' |
| | | and | SuspectedFlag | = | Fixed |
| Other suspicious disease diagnosis | Diagnosis | | ICD10Code | in | (other suspicious disease ICD10 code list) |
| | | and | DiagnosisDate | >= | '10/01/2000' |
| | | and | DiagnosisDate | <= | '09/30/2010' |
| | | and | SuspectedFlag | = | Fixed |
| Diabetes diagnosis | Diagnosis | | ICD10Code | in | (diabetes ICD10 code list) |
| | | and | DiagnosisDate | >= | '10/01/2000' |
| | | and | DiagnosisDate | <= | '09/30/2010' |
| | | and | SuspectedFlag | = | Fixed |
| Steroid drug administrations | Medications and injections | | DrugCode | in | (steroid drugs code list) |
| | | and | ExecuteDate | >= | '10/01/2000' |
| | | and | ExecuteDate | <= | '09/30/2010' |
| Anticancer drug administrations | Medications and injections | | DrugCode | in | (anticancer drugs code list) |
| | | and | ExecuteDate | >= | '10/01/2000' |
| | | and | ExecuteDate | <= | '09/30/2010' |
| Diabetes drug administrations | Medications and injections | | DrugCode | in | (diabetes drugs code list) |
| | | and | ExecuteDate | >= | '10/01/2000' |
| | | and | ExecuteDate | <= | '09/30/2010' |
| HbA1c test execution | Laboratory test | | LaboratoryTestCode | in | (HbA1c test code) |
| | | and | TestDate | >= | '10/01/2000' |
| | | and | TestDate | <= | '09/30/2010' |
| Created reports for confirmation by the investigators | | | | | |
| Radiological finding reports | Diagnostic studies | | ReportName | in | (report name list of oral region) |
| Pathological finding reports | Diagnostic studies | | SampleName | contains | 'bone' |
| | | or | SampleName | contains | 'jaw' |
| Radio isotope finding reports | Diagnostic studies | | | | |

BP, bisphosphonates; ID, identifications; ICD, International Classification of Diseases.

Among the approximately 800 000 cases at our hos-
pital, 8772 were categorised using the terms 'Inclusion
criteria: Osteoporosis diagnosis'; among this group,
7195 were further categorised using 'Inclusion criteria:
Osteoporosis drug administration'. We then calculated
the time that had elapsed since the osteoporosis diag-
nosis, determined that 7062 patients were aged 20 years
or older, and created a targeted patient list. Among
those on the targeted patient list, 23 patients were
placed under the heading 'Exclusion criteria: Oral
cancer diagnosis', 110 under 'Exclusion criteria:
Intravenous BP administration', 4200 under 'Oral BP
administration', 84 under 'Inflammatory jaw condition
diagnosis', 2064 as 'Other suspicious disease diagnosis',
1700 as 'Diabetes diagnosis', 4551 as 'Steroid drug
administration', 904 as 'Anticancer drug administra-
tions', 1055 as 'Diabetes drug administrations' and 3641
as 'HbA1c test execution'. Because of the end point
considered, patients who were classified under
'Inflammatory jaw condition diagnosis' or 'Other suspi-
cious disease diagnosis' were confirmed using prede-
fined hierarchical diagnostic criteria by investigators
who performed the statistical analyses and arranged the
research results. We show the schema of data collection
and confirmation in figure 4.



The accuracy of the data extracted by the ERS was 
then characterised. Reviewing the medical records 
revealed that 2817 patients were not labelled as 'Oral BP 
administration', including seven (one who received 
intravenous BP) treated at other hospitals. Six patients 
had been treated with radiation therapy to the oral and 
maxillofacial regions. Among the 72 patients classified 
under 'Inflammatory jaw condition diagnosis', 35 cases 
and 37 non-cases were identified. 
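
The counts reported above can be summarised with simple arithmetic. The following Python sketch only restates numbers given in this section; the proportion it prints is a plain ratio of the reviewed 'Inflammatory jaw condition diagnosis' patients who were confirmed as cases, and it should not be read as a validated accuracy measure.

```python
# Counts reported in the Results section.
funnel = [
    ("Osteoporosis diagnosis", 8772),
    ("and osteoporosis drug administration", 7195),
    ("and aged 20 years or older (targeted patient list)", 7062),
]
first_step = funnel[0][1]
for label, n in funnel:
    # Share of the first step that remains after each criterion.
    print(f"{label}: {n} ({n / first_step:.1%})")

# Chart review of patients flagged 'Inflammatory jaw condition diagnosis'.
reviewed, cases = 72, 35
non_cases = reviewed - cases
confirmed_fraction = cases / reviewed
print(f"Confirmed as cases: {cases}/{reviewed} = {confirmed_fraction:.1%}")
```

Roughly half of the flagged patients were confirmed as cases, which is consistent with the need for investigator confirmation described in the Methods.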

The data extraction period lasted approximately 
3 months. Ten meetings were held during the protocol 
development stage to create and validate the computable 
criteria and the list of codes for various drugs and diagno- 
ses (ie, ICD-10). The time required for logical query defin- 
ition when using the ERS was approximately 20 h. The 
investigator confirmations and statistical analyses took 
approximately 4 months. 



DISCUSSION 

We identified eligible patients for this research and 
extracted the data necessary for confirmation by investi- 
gators and for statistical analyses. 



Figure 4 Schema of data 
collection and confirmation. 
We asked the chart reviewers to evaluate the system in
a questionnaire about 'the effect of computer program- 
ming support for data retrieval from the EMR', 'the 
result of the data retrieval', 'the positive and negative 
aspects of our ERS use' and 'the aspects of our method 
that should be improved'. The investigators evaluating
the system mentioned the following points: (1) the
method enabled them to extract the necessary data for 
diagnosis and drug administration without exception; 
(2) by screening the entire patient population at the 
hospital using the ERS, they could identify not just eli- 
gible patients in the department of oral and maxillo- 
facial surgery but all eligible patients, which reduced the 
study bias and (3) by creating reports for confirmation, 
it enabled investigators to devote their time to reading 
images, thus effectively reducing the time required for 
reviewing medical records. The aspects of our method 
that should be improved are the 'lack of claim data' 
and the 'administrative complexity of EMR data use'. 
No negative aspects of our ERS use were noted. 

The ERS allowed for the collection of information on 
patient eligibility by efficiently combining clinical infor- 
mation. Although we did not compare our method with 
other methods, our proposed method reduced the 
labour normally required from investigators and 
improved the reliability of test research results, which 
indicated that it was useful. 

To design the ERS database, we designed a new data 
model optimised for patient identification. The main dif- 
ferences between our data model and the star schema 
were as follows: (1) demographic data, which were pre- 
sented in list form in our EMR system, were presented as 
a fact-less fact table and (2) date, time, measurements 
and text information were presented in dimension 
tables. 32 The most significant characteristic of our 
method for patient identification is the use of a specia- 
lised data model in clinical research and the ability to 
execute a large number of conversion tasks at the proto- 
col development stage. Data can be converted efficiently 
in a way that does not depend on the EMR database 
structure when converting narrative criteria to 
computable criteria. In this research, we considered 
extracting data directly from EMRs at the protocol 
development stage. However, EMR data were recorded in a 
sequential format for every medical practice, and the 
database structure was complicated. Comprehending the 
location and meaning of the necessary data thus required 
tremendous effort. It was difficult to make precise logical 
queries for patient identification. However, because our 
ERS data model was arranged by subject (eg, tests, 
diagnosis), it was easy to interpret the available 
information. Because the ERS made it possible to 
standardise computable criteria and SQL, it was also 
possible to create computable criteria in little time. 
Additionally, verifying 
the patient identification accuracy was easy because it was 
possible to test each individual criterion. 
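
The idea of a subject-oriented model in which each computable 
criterion can be tested on its own may be sketched as follows. 
This is a minimal illustration and not the actual ERS schema: 
the table and column names, the criterion names, the ICD-10 
code value and the patient rows are all assumptions made for 
this example, and an in-memory SQLite database stands in for 
the ERS database. 

```python
import sqlite3

# Hypothetical, much-simplified subject-oriented tables (not the ERS schema).
SCHEMA = """
CREATE TABLE patient      (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
CREATE TABLE diagnosis    (patient_id INTEGER, icd10 TEXT, diagnosed_on TEXT);
CREATE TABLE prescription (patient_id INTEGER, drug_class TEXT, prescribed_on TEXT);
"""

# One computable criterion per entry. Each query returns patient_id only,
# so every criterion can be run and inspected separately.
# 'K10.2' and 'BP' are illustrative values assumed for this sketch.
CRITERIA = {
    "has_target_diagnosis":
        "SELECT DISTINCT patient_id FROM diagnosis WHERE icd10 = 'K10.2'",
    "received_target_drug":
        "SELECT DISTINCT patient_id FROM prescription WHERE drug_class = 'BP'",
}


def evaluate(conn, criteria):
    """Run each criterion separately and return {name: set of patient ids}."""
    return {
        name: {row[0] for row in conn.execute(sql)}
        for name, sql in criteria.items()
    }


def eligible(per_criterion):
    """Return the patients that satisfy every criterion (set intersection)."""
    sets = list(per_criterion.values())
    return set.intersection(*sets) if sets else set()


# Synthetic rows, invented purely to exercise the queries.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [(1, 1950), (2, 1962), (3, 1971)])
conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?)",
                 [(1, "K10.2", "2010-04-01"), (2, "K10.2", "2011-09-15")])
conn.executemany("INSERT INTO prescription VALUES (?, ?, ?)",
                 [(1, "BP", "2009-01-10"), (3, "BP", "2012-02-02")])

results = evaluate(conn, CRITERIA)   # per-criterion patient sets
eligible_ids = eligible(results)     # patients meeting all criteria
```

Keeping the per-criterion sets in `results` before intersecting 
them is what allows each criterion to be checked individually 
when the identified patients look wrong. 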

The SQL generated by our ERS does not reduce the 
time required for data retrieval. Our ERS also cannot 
retrieve information that is not in the data model. 
Current EMRs do not store all necessary data for clinical 
research, including information related to pregnancy, 
performance status, cancer stage, availability of transpor- 
tation to the hospital, specific tests that are not typically 
performed, drug regimen, outcomes (including death) 
and adverse events. Additionally, not all tests are 
administered to all patients, and necessary information may 
have been recorded in medical records at another hos- 
pital. 44 To facilitate EMR use in clinical research, it is 
necessary to accumulate as much of this information 
as possible. Within the hospital, much of this 
information, including test reports stored only in the 
departmental system, is not well integrated with 
EMRs. 50 It is nevertheless important to utilise this 
information. Enabling ERS use within and across multiple 
institutions is also an important future task. 

Currently, most clinical research studies that use data 
from EMRs are planned according to the concept that 
the primary use of EMRs is for clinical practice and a 
secondary use is for clinical research. 44 Therefore, most 
investigators attempt to convert the text-form eligibility 
criteria that have already been defined in a protocol to 
computable criteria at the data collection stage. 35 36 
However, we propose that computable criteria should 
not be a result of the automated conversion of narrative 
criteria but rather a result of research preparation involv- 
ing medical concepts that are not expressed logically or 
explicitly in the narrative criteria. Some medical con- 
cepts may be interpreted differently depending on the 
research and the investigator caring for the patients. 
Additionally, current eligibility criteria are vague or 
complex, and they do not take the use of actual EMRs 
into account. Converting narrative criteria into 
computable criteria appropriately requires high-level 
medical decisions that address the research question. 
Therefore, we believe that most of the conversion of 
eligibility criteria to computable criteria should be 
executed at the protocol development stage. In addition, 
the conversion process should be divided into entity-level 
conversions, which require higher-level medical decisions, 
and attribute-level conversions. To reduce the burden of 
conversion, it may be 
useful to apply NLP technology for the conversion from 
entity-level criteria to attribute-level criteria. Moreover, it 
is important to further discuss protocol standardisation, 
including eligibility criteria representation for comput- 
able use. For instance, the attribute-level criteria that 
describe the search conditions in detail may be useful in 
global studies to address diseases that vary according to 
the diagnostic criteria used in each country. 
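
One way to picture the split between entity-level and 
attribute-level criteria is the small data structure below. 
It is a sketch under our own assumptions, not the 
representation used by the ERS: the class names, the column 
names, the ICD-10 code and the date are hypothetical. An 
entity-level criterion names the medical concept chosen by 
the investigators, and its attribute-level criteria spell out 
the search conditions in enough detail to be rendered as SQL. 

```python
from dataclasses import dataclass, field

ALLOWED_OPERATORS = {"=", "IN", ">=", "<="}


@dataclass
class AttributeCriterion:
    """One concrete search condition on a single column (hypothetical)."""
    column: str
    operator: str
    value: object


@dataclass
class EntityCriterion:
    """A medical concept refined into attribute-level conditions."""
    concept: str
    attributes: list = field(default_factory=list)


def to_where_clause(criterion):
    """Render attribute-level conditions as a parameterised WHERE clause.

    Column names are assumed to come from the protocol definition, not from
    free user input; values are always passed as bound parameters.
    """
    parts, params = [], []
    for attr in criterion.attributes:
        if attr.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {attr.operator}")
        if attr.operator == "IN":
            values = list(attr.value)
            placeholders = ", ".join("?" for _ in values)
            parts.append(f"{attr.column} IN ({placeholders})")
            params.extend(values)
        else:
            parts.append(f"{attr.column} {attr.operator} ?")
            params.append(attr.value)
    return " AND ".join(parts), params


# Illustrative values only: the code list and the date are assumptions.
jaw_osteomyelitis = EntityCriterion(
    concept="osteomyelitis of the jaw",
    attributes=[
        AttributeCriterion("icd10", "IN", ["K10.2"]),
        AttributeCriterion("diagnosed_on", ">=", "2008-01-01"),
    ],
)

clause, params = to_where_clause(jaw_osteomyelitis)
```

Because the attribute-level list is explicit, a study in 
another country could keep the same entity-level concept and 
swap in its own code list. 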

Concerning EMR data accuracy, the sensitivity of the ICD-10 
code (osteomyelitis of the jaw) was 48.6% (35/72). 
The investigators reported six simple diagnosis errors, 
seven oral BP administrations at other hospitals, and 
six patients who were treated with radiation therapy 
in the oral and maxillofacial region. 47 Given the accuracy 
of current EMRs, the investigators had to confirm 
the information. However, the EMRs provided rich 
confirmation data and were useful in improving research 
data accuracy. In this study, we checked the data from 
actual EMRs manually and identified patients precisely 
and extensively using coded information, narrative infor- 
mation, and images. However, only information from 
existing EMRs was available. Current EMRs have a high 
degree of flexibility in data entry and are not 
managed for research purposes, which decreases their 
reliability. It is necessary to improve data quality through 
quality control without placing too much of a burden 
on clinical practice. Alternatively, it may be possible to 
organise data sufficiently before research use. 51-53 
Standardising the terminology and exchange formats 
used in the healthcare setting has facilitated 
international discourse. 46 54-58 Further discussion is 
needed on their use not only in clinical practice but also 
for research purposes, particularly how to utilise various 
standards when using EMRs beyond the hospital setting. 
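
The sensitivity of the diagnosis code reported at the start 
of this paragraph can be checked with a short calculation. 
The sketch assumes that 72 is the number of cases confirmed 
by the investigators and 35 is the number of those cases that 
carried the code; the function and variable names are ours. 

```python
def sensitivity(true_positives, confirmed_cases):
    """Fraction of confirmed cases that the coded diagnosis captured."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return true_positives / confirmed_cases


coded_cases = 35      # confirmed cases carrying the code (from the text)
confirmed_cases = 72  # all confirmed cases (from the text)

value = sensitivity(coded_cases, confirmed_cases)
missed = confirmed_cases - coded_cases  # cases a code-only search would miss

print(f"sensitivity = {value:.1%}, missed by code alone = {missed}")
```

Under that reading, a search on the code alone would have 
missed 37 of the 72 cases, which is why narrative information 
and images were also used for identification. 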

CONCLUSION 

We propose a pragmatic method for EMR-based obser- 
vational studies. Our ERS is already used to support 
hospital-based cohort studies, clinical trial recruitment 
and the eClinical trial infrastructure 44 at our centre. We 
believe an efficient ERS and standardised data process- 
ing model are essential to facilitate clinical research that 
utilises EMRs. 